Fix JVM <clinit> deadlock by removing static final accessor fields #48689

Open
jeet1995 wants to merge 1 commit into Azure:main from jeet1995:fix/clinit-deadlock-bridge-methods

@jeet1995 jeet1995 commented Apr 3, 2026

Summary

Fixes a JVM-level <clinit> deadlock that occurs when multiple threads concurrently trigger Cosmos SDK class loading for the first time. This is a permanent, unrecoverable deadlock that hangs all affected threads indefinitely.

Also fixes a latent CosmosItemSerializer.DEFAULT_SERIALIZER null bug caused by circular <clinit> dependencies between CosmosItemSerializer and DefaultCosmosItemSerializer.

Fixes: #48622, #48585

Root Cause

Deadlock

Consuming classes cached accessors in private static final fields:

private static final FeedResponseAccessor feedResponseAccessor =
    ImplementationBridgeHelpers.FeedResponseHelper.getFeedResponseAccessor();

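During <clinit>, the getter finds the accessor null and calls initializeAllAccessors(), which eagerly loads 40+ classes. When two threads enter <clinit> of different classes simultaneously, each thread owns the in-progress initialization of its own class and blocks waiting for a class that the other thread is initializing. Under the JVM's per-class initialization locks this circular wait never resolves, so the deadlock is permanent (JLS §12.4.2).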
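The circular wait can be reproduced in isolation. The following is a minimal sketch, not SDK code: the classes A and B, the latch, and the 2-second timeout are all hypothetical stand-ins, and the latch exists only to make the interleaving deterministic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ClinitDeadlockDemo {
    // Opens once both threads are inside a static initializer (illustrative only).
    static final CountDownLatch BOTH_INSIDE = new CountDownLatch(2);

    static void rendezvous() {
        BOTH_INSIDE.countDown();
        try {
            BOTH_INSIDE.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static class A {
        static final Object VALUE;
        static {
            rendezvous();     // A is now being initialized by this thread
            VALUE = B.VALUE;  // blocks: B is being initialized by the other thread
        }
    }

    static class B {
        static final Object VALUE;
        static {
            rendezvous();     // B is now being initialized by this thread
            VALUE = A.VALUE;  // blocks: A is being initialized by the other thread
        }
    }

    /** Returns true if both initializer threads are still stuck after the timeout. */
    public static boolean detectDeadlock() {
        Thread t1 = new Thread(() -> { Object ignored = A.VALUE; }, "init-A");
        Thread t2 = new Thread(() -> { Object ignored = B.VALUE; }, "init-B");
        // Daemon threads, so the stuck initializers do not keep the JVM alive.
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            t1.join(2000);
            t2.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t1.isAlive() && t2.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked = " + detectDeadlock());
    }
}
```

Neither thread can be interrupted out of this state, which matches the "unrecoverable" description in the summary: once it happens, the two classes stay unusable for the life of the JVM.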

DEFAULT_SERIALIZER null

CosmosItemSerializer.DEFAULT_SERIALIZER was assigned from DefaultCosmosItemSerializer.DEFAULT_SERIALIZER. When DefaultCosmosItemSerializer.<clinit> ran first (e.g., via INTERNAL_DEFAULT_SERIALIZER access), it triggered CosmosItemSerializer.<clinit> on the same thread, which then read DefaultCosmosItemSerializer.DEFAULT_SERIALIZER before it was set. Per JLS §12.4.2, a recursive initialization request from the thread that is already initializing the class returns immediately, so the read observed null. This caused NullPointerException: serializer is null in Utils.parse() and GatewayAddressCache.
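The same-thread recursion can be shown with two small classes. This is a sketch under assumptions: BaseSerializer and DefaultSerializer are hypothetical stand-ins, and the superclass relationship is just one convenient way to make the second class's initializer run inside the first's.

```java
public class RecursiveInitDemo {
    static class BaseSerializer {
        // Reads the subclass field while the subclass may still be mid-initialization.
        static final BaseSerializer DEFAULT = DefaultSerializer.INSTANCE;
    }

    static class DefaultSerializer extends BaseSerializer {
        static final DefaultSerializer INSTANCE = new DefaultSerializer();
    }

    /**
     * Touches the subclass first. Its initialization starts, initializes the
     * superclass, and the superclass initializer reads INSTANCE before it is
     * assigned, so BaseSerializer.DEFAULT is permanently null.
     */
    public static boolean baseDefaultIsNullWhenSubclassLoadsFirst() {
        DefaultSerializer first = DefaultSerializer.INSTANCE;
        return first != null && BaseSerializer.DEFAULT == null;
    }

    public static void main(String[] args) {
        System.out.println(baseDefaultIsNullWhenSubclassLoadsFirst());
    }
}
```

If BaseSerializer were touched first instead, both fields would be non-null, which is why a bug of this shape stays latent until the initialization order changes.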

Fix

1. Uniform static getter pattern

Every accessor is now obtained through a short private static getter method, with no fields and no <clinit> involvement:

// Before — triggers initializeAllAccessors() during <clinit>
private static final FeedResponseAccessor feedResponseAccessor =
    ImplementationBridgeHelpers.FeedResponseHelper.getFeedResponseAccessor();

// After — no <clinit> involvement, accessor resolved lazily on first use
private static FeedResponseAccessor feedResponseAccessor() {
    return ImplementationBridgeHelpers.FeedResponseHelper.getFeedResponseAccessor();
}

The accessor is already cached inside ImplementationBridgeHelpers via AtomicReference — the getter adds one volatile read (~1ns) per call, negligible vs actual Cosmos operations.
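To illustrate why the getter form keeps the consuming class's <clinit> independent of accessor registration, here is a minimal sketch. GreetingAccessor, GreetingHelper, and Consumer are hypothetical names, and the helper only approximates the AtomicReference caching described above; it is not the ImplementationBridgeHelpers implementation.

```java
import java.util.concurrent.atomic.AtomicReference;

public class LazyAccessorDemo {
    interface GreetingAccessor { String greet(String who); }

    // Hypothetical helper: caches the accessor in an AtomicReference.
    static final class GreetingHelper {
        private static final AtomicReference<GreetingAccessor> ACCESSOR = new AtomicReference<>();

        static void setGreetingAccessor(GreetingAccessor accessor) {
            ACCESSOR.compareAndSet(null, accessor); // first registration wins
        }

        static GreetingAccessor getGreetingAccessor() {
            GreetingAccessor accessor = ACCESSOR.get(); // one volatile read
            if (accessor == null) {
                throw new IllegalStateException("accessor not registered yet");
            }
            return accessor;
        }
    }

    static final class Consumer {
        static final String NAME = describe(); // forces a real static initializer

        private static String describe() { return "consumer"; }

        // Getter instead of a static final field: nothing is resolved during <clinit>.
        private static GreetingAccessor greetingAccessor() {
            return GreetingHelper.getGreetingAccessor();
        }

        static String hello(String who) {
            return greetingAccessor().greet(who);
        }
    }

    public static String run() {
        String name = Consumer.NAME; // Consumer initializes before any accessor exists
        GreetingHelper.setGreetingAccessor(who -> "hello, " + who);
        return name + ": " + Consumer.hello("cosmos");
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With a static final field in place of the getter, Consumer's initializer would have called the helper before registration and failed; with the getter, the lookup is deferred until hello() is actually invoked.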

2. Break circular <clinit> for DEFAULT_SERIALIZER

// Before — circular <clinit> dependency
public final static CosmosItemSerializer DEFAULT_SERIALIZER =
    DefaultCosmosItemSerializer.DEFAULT_SERIALIZER;

// After — creates instance directly, no cross-class <clinit> dependency
public final static CosmosItemSerializer DEFAULT_SERIALIZER =
    new DefaultCosmosItemSerializer(
        Utils.getDocumentObjectMapper(Configs.getItemSerializationInclusionMode()));

Scope of changes

| Category | Count | Description |
| --- | --- | --- |
| Static final accessor fields removed | 49 | All private static final XxxAccessor and private final XxxAccessor fields in consuming classes |
| Static getter methods added | ~80 | private static XxxAccessor xxx() { return getXxxAccessor(); } |
| Inline accessor calls replaced | ~200 | All ImplementationBridgeHelpers.XxxHelper.getXxxAccessor().method() inline calls converted to xxx().method() |
| Files changed | 79 | Across com.azure.cosmos |
| Missing static { initialize(); } blocks added | 3 | CosmosRequestContext, CosmosOperationDetails, CosmosDiagnosticsContext |
| Accessor rename fix | 1 | getCosmosAsyncClientAccessor() → getCosmosDiagnosticsThresholdsAccessor() in CosmosDiagnosticsThresholdsHelper |
| DEFAULT_SERIALIZER circular <clinit> fix | 1 | CosmosItemSerializer creates instance directly instead of cross-referencing DefaultCosmosItemSerializer. Removed dead DefaultCosmosItemSerializer.DEFAULT_SERIALIZER field and its serializationInclusionModeAwareObjectMapper. |
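The "missing static { initialize(); } blocks" row refers to classes that never registered their accessor on load. A minimal sketch of the difference, using hypothetical classes and a plain AtomicReference in place of the real helper registry:

```java
import java.util.concurrent.atomic.AtomicReference;

public class RegistrationDemo {
    interface SizeAccessor { int sizeOf(Object target); }

    // Stand-ins for two helper slots (hypothetical, not the SDK's registry).
    static final AtomicReference<SizeAccessor> REGISTERED = new AtomicReference<>();
    static final AtomicReference<SizeAccessor> FORGOTTEN = new AtomicReference<>();

    static final class WithBlock {
        static void initialize() { REGISTERED.set(target -> 1); }
        static { initialize(); } // runs as part of class initialization
        static void touch() { }
    }

    static final class WithoutBlock {
        static void initialize() { FORGOTTEN.set(target -> 2); }
        // No static block: initialize() runs only if something calls it explicitly.
        static void touch() { }
    }

    /** Returns { accessor registered with block, accessor registered without block }. */
    public static boolean[] run() {
        WithBlock.touch();    // triggers <clinit>, which registers the accessor
        WithoutBlock.touch(); // triggers <clinit>, which registers nothing
        return new boolean[] { REGISTERED.get() != null, FORGOTTEN.get() != null };
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println(result[0] + " " + result[1]);
    }
}
```

A class without the block leaves its slot empty after loading, which is the condition a per-helper non-null check can detect.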

Documented exceptions (not converted to static getters)

  • DefaultCosmosItemSerializer: instance field intentionally preserved. This class is instantiated during CosmosItemSerializer.<clinit>; the instance field initialization triggers the initializeAllAccessors() fallback at exactly the right time (after super(), before constructor body).
  • HttpClient.java: Java interface; Java 8 doesn't support private static interface methods. Uses method-local variable.
  • Utils.java: ensureItemSerializerAccessor() uses a CAS/caching pattern with AtomicReference. Preserved as-is.
  • BridgeInternal/ModelBridgeInternal/UtilBridgeInternal: accessor registration sources, not consumers.

Why this approach

| Approach | Problem |
| --- | --- |
| Class.forName() in getters | Can't force accessor registration during recursive <clinit> (JLS §12.4.2 no-op). Also reverts Fabian's intentional removal in PR #28912 |
| initializeAllAccessors() from <clinit> | The root cause: loads 40+ classes, creates circular dependency |
| CosmosClientBuilder.static{} | Doesn't cover customer code touching model classes before building a client |
| Static block in ImplementationBridgeHelpers | Still deadlocks: <clinit> entered from within other classes' <clinit> via static final accessor fields (demo PR #48697) |
| Static getter methods | Zero classloader risk, zero <clinit> involvement, zero recursive edge cases |

Tests

  1. concurrentAccessorInitializationShouldNotDeadlock (invocationCount=5): forks fresh child JVMs in which 12 threads, released together by a CyclicBarrier, concurrently trigger <clinit> of 6 high-risk classes. A 30s timeout catches a deadlock.

  2. allAccessorClassesMustHaveStaticInitializerBlock: a forked JVM iterates all *Helper inner classes, calls each getter, and verifies the accessor is non-null. Catches missing static { initialize(); } blocks.

  3. noStaticOrInstanceAccessorFieldsInConsumingClasses: reflection-based enforcement. Collects all Accessor interface types from ImplementationBridgeHelpers and scans every class for static or final fields of those types. Catches reintroduction of dangerous patterns regardless of source formatting.

  4. accessorInitialization: existing test, validates the explicit initializeAllAccessors() bootstrap path.
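The reflection-based check in test 3 can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's test: FooAccessor, Good, and Bad are hypothetical, and the real test discovers accessor types and candidate classes itself rather than receiving them as arguments.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class AccessorFieldScanDemo {
    interface FooAccessor { }

    // Allowed shape: accessor obtained through a static getter, no field.
    static class Good {
        private static FooAccessor foo() { return null; }
    }

    // Forbidden shape: accessor cached in a static final field.
    static class Bad {
        private static final FooAccessor FOO = null;
    }

    /** Lists "Class.field" for every static or final field whose type is an accessor type. */
    public static List<String> findViolations(Set<Class<?>> accessorTypes, Class<?>... classes) {
        List<String> violations = new ArrayList<>();
        for (Class<?> candidate : classes) {
            // getDeclaredFields() inspects metadata only; it does not initialize the class.
            for (Field field : candidate.getDeclaredFields()) {
                int modifiers = field.getModifiers();
                boolean cached = Modifier.isStatic(modifiers) || Modifier.isFinal(modifiers);
                if (cached && accessorTypes.contains(field.getType())) {
                    violations.add(candidate.getSimpleName() + "." + field.getName());
                }
            }
        }
        return violations;
    }

    public static List<String> demo() {
        Set<Class<?>> accessorTypes = Collections.<Class<?>>singleton(FooAccessor.class);
        return findViolations(accessorTypes, Good.class, Bad.class);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the check works on field metadata rather than source text, it is unaffected by line breaks or naming in the declaration, which is the "regardless of source formatting" property the test description mentions.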

@github-actions github-actions bot added the azure-spring and Cosmos labels Apr 3, 2026
jeet1995 commented Apr 3, 2026

Closing — bridge classes don't allow adding new methods. Proceeding with #48667 (Class.forName with explicit classloader).

@jeet1995 jeet1995 closed this Apr 3, 2026
@jeet1995 jeet1995 reopened this Apr 3, 2026
@jeet1995 jeet1995 force-pushed the fix/clinit-deadlock-bridge-methods branch from e57066d to 66afd43 Compare April 4, 2026 00:05
@jeet1995 jeet1995 changed the title from "Fix JVM <clinit> deadlock using targeted bridge methods (alternative to #48667)" to "Fix JVM <clinit> deadlock by removing static final accessor fields (alternative to #48667)" Apr 4, 2026
jeet1995 commented Apr 4, 2026

/azp run java - cosmos - ci

jeet1995 commented Apr 4, 2026

/azp run java - cosmos - tests

jeet1995 commented Apr 4, 2026

/azp run java - cosmos - kafka

jeet1995 commented Apr 4, 2026

/azp run java - cosmos - spark

jeet1995 commented Apr 4, 2026

/azp run java - spring - ci

@azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).

4 similar comments

@jeet1995 jeet1995 marked this pull request as ready for review April 5, 2026 00:30
Copilot AI review requested due to automatic review settings April 5, 2026 00:30
@jeet1995 jeet1995 changed the title from "Fix JVM <clinit> deadlock by removing static final accessor fields (alternative to #48667)" to "Fix JVM <clinit> deadlock by removing static final accessor fields" Apr 5, 2026
jeet1995 commented Apr 5, 2026

/azp run java - cosmos - ci

jeet1995 commented Apr 5, 2026

/azp run java - cosmos - tests

jeet1995 commented Apr 8, 2026

/azp run java - cosmos - spark

jeet1995 commented Apr 8, 2026

/azp run java - spring - ci

@azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).

4 similar comments

@jeet1995 jeet1995 force-pushed the fix/clinit-deadlock-bridge-methods branch from c8c4732 to 5df3405 Compare April 8, 2026 20:54
jeet1995 commented Apr 8, 2026

/azp run java - cosmos - ci

jeet1995 commented Apr 8, 2026

/azp run java - cosmos - tests

jeet1995 commented Apr 8, 2026

/azp run java - cosmos - kafka

jeet1995 commented Apr 8, 2026

/azp run java - cosmos - spark

jeet1995 commented Apr 8, 2026

/azp run java - spring - ci

@azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).

4 similar comments

Replace all static final accessor fields and inline ImplementationBridgeHelpers
calls with uniform private static getter methods across 78 files. This eliminates
<clinit>-time class loading that caused permanent deadlocks under concurrent
class initialization (JLS 12.4.2).

Also fix CosmosItemSerializer.DEFAULT_SERIALIZER circular <clinit> dependency —
create the instance directly instead of cross-referencing
DefaultCosmosItemSerializer.DEFAULT_SERIALIZER which is null during recursive
same-thread <clinit>.

Fixes: Azure#48622, Azure#48585

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jeet1995 jeet1995 force-pushed the fix/clinit-deadlock-bridge-methods branch from 5df3405 to 2c38b75 Compare April 8, 2026 21:23
jeet1995 commented Apr 8, 2026

/azp run java - cosmos - ci

jeet1995 commented Apr 8, 2026

/azp run java - cosmos - tests

jeet1995 commented Apr 8, 2026

/azp run java - cosmos - kafka

jeet1995 commented Apr 8, 2026

/azp run java - cosmos - spark

jeet1995 commented Apr 8, 2026

/azp run java - spring - ci

@azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).

4 similar comments
